Using Feature Selection to Find Inputs That Work Better as Extra Outputs

Authors

  • Rich Caruana
  • Virginia R De Sa
Abstract

In supervised learning there is usually a clear distinction between inputs and outputs: inputs are what you measure, outputs are what you predict from those measurements. The distinction between inputs and outputs is not this simple. Previously, we demonstrated that on synthetic problems some input features are more useful when used as extra outputs than when used as inputs [6]. This paper shows the same effect on a real problem, and presents a means of determining what features can be used as extra outputs. We show that the feature selection method devised by Koller and Sahami [11] can be used to select features to use as extra outputs, and that using some features as extra outputs instead of as inputs yields better performance on the DNA splice-junction domain.

1 MOTIVATION

The goal in supervised learning is to learn functions that map inputs to outputs with high predictive accuracy. The common practice in backprop nets is to use all features that will be available for test cases as inputs, and to use as outputs only features that need to be predicted. On real problems, where there may be many redundant or irrelevant features, using all the available features as inputs is often suboptimal. Many algorithms learn better given a carefully selected subset of the features to use as inputs [3, 10, 11]. If feature selection is used to find the features to use as inputs, what should be done with the features not selected? Usually, features not selected for use as inputs are discarded. But there are other ways to benefit from features without using them as inputs.

One way to benefit from features not used as inputs is multitask learning. Multitask learning (MTL) is an inductive transfer method where extra tasks are learned in parallel with the main task while using a shared representation. Because the extra tasks share a hidden layer with the main task, internal representations learned for the extra tasks can be used by the main task outputs, often improving performance on the main task. MTL in backprop nets is well documented [13, 1, 2, 4, 8, 9, 7]. Most applications of MTL are to problems where some features available for the training set will not be available for future test cases [5]. We recently demonstrated that there are problems where some features that could be used as inputs would be...
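The shared-hidden-layer arrangement described above can be illustrated with a minimal sketch. This is not the authors' implementation or experimental setup: the network size, learning rate, the synthetic binary data, and the names `train_mtl_net` and `predict_main` are all assumptions made for illustration. It shows only the mechanics of moving some features from the input side to the output side of a backprop net, so that they are trained as extra outputs alongside the main task.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mtl_net(X, y_main, Y_extra, n_hidden=8, lr=0.5, epochs=300, seed=0):
    """Train a one-hidden-layer net whose outputs are [main task, extra tasks].

    Hypothetical helper: all outputs read from the same hidden layer, so
    gradients from the extra outputs also shape the shared representation.
    """
    rng = np.random.default_rng(seed)
    T = np.column_stack([y_main, Y_extra])           # column 0 is the main task
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, T.shape[1]))
    b2 = np.zeros(T.shape[1])
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)                     # shared hidden layer
        P = sigmoid(H @ W2 + b2)                     # main + extra outputs
        losses.append(float(np.mean((P - T) ** 2)))  # mean squared error
        # Backprop of squared error, averaged over training cases.
        dP = (P - T) * P * (1.0 - P) / len(X)
        dH = (dP @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ dP)
        b2 -= lr * dP.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return (W1, b1, W2, b2), losses

def predict_main(params, X):
    """Return only the main-task output; extra outputs are ignored at test time."""
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)[:, 0]

# Synthetic stand-in data (an assumption, not the DNA splice-junction set).
rng = np.random.default_rng(1)
X_in = rng.integers(0, 2, size=(200, 6)).astype(float)    # features kept as inputs
y = (X_in[:, :3].sum(axis=1) >= 2).astype(float)          # synthetic main task
Y_extra = np.column_stack([X_in[:, 0] * X_in[:, 1],       # features moved to the
                           X_in[:, 1] * X_in[:, 2]])      # output side

params, losses = train_mtl_net(X_in, y, Y_extra)
main_predictions = predict_main(params, X_in)
```

Because the extra outputs are needed only during training, prediction reads just the first output column. Which features to move to the output side is the question the paper addresses with feature selection; this sketch picks them arbitrarily.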

Similar resources

Benefitting from the Variables that Variable Selection Discards

In supervised learning variable selection is used to find a subset of the available inputs that accurately predict the output. This paper shows that some of the variables that variable selection discards can beneficially be used as extra outputs for inductive transfer. Using discarded input variables as extra outputs forces the model to learn mappings from the variables that were selected as in...

Promoting Poor Features to Supervisors: Some Inputs Work Better as Outputs

In supervised learning there is usually a clear distinction between inputs and outputs: inputs are what you will measure, outputs are what you will predict from those measurements. This paper shows that the distinction between inputs and outputs is not this simple. Some features are more useful as extra outputs than as inputs. By using a feature as an output we get more than just the case values...

Feature Selection in Structural Health Monitoring Big Data Using a Meta-Heuristic Optimization Algorithm

This paper focuses on the processing of structural health monitoring (SHM) big data. Extracted features of a structure are reduced using an optimization algorithm to find a minimal subset of salient features by removing noisy, irrelevant and redundant data. The PSO-Harmony algorithm is introduced for feature selection to enhance the capability of the proposed method for processing the measure...

Short term load forecast by using Locally Linear Embedding manifold learning and a hybrid RBF-Fuzzy network

The aim of the short term load forecasting is to forecast the electric power load for unit commitment, evaluating the reliability of the system, economic dispatch, and so on. Short term load forecasting obviously plays an important role in traditional non-cooperative power systems. Moreover, in a restructured power system a generator company (GENCO) should predict the system demand and its corr...

Neuro-Fuzzy Based Algorithm for Online Dynamic Voltage Stability Status Prediction Using Wide-Area Phasor Measurements

In this paper, a novel neuro-fuzzy based method combined with a feature selection technique is proposed for online dynamic voltage stability status prediction of power system. This technique uses synchronized phasors measured by phasor measurement units (PMUs) in a wide-area measurement system. In order to minimize the number of neuro-fuzzy inputs, training time and complication of neuro-fuzzy ...


Publication date: 1998